Weight Estimation for Audio-Visual Multi-level Fusion in Bimodal Speaker Identification

Authors

  • Zhiyong Wu
  • Lianhong Cai
  • Helen M. Meng
Abstract

This paper investigates the estimation of fusion weights under varying acoustic noise conditions for an audio-visual multi-level hybrid fusion strategy in speaker identification. The multi-level fusion combines model-level and decision-level fusion via dynamic Bayesian networks (DBNs). A novel methodology based on support vector regression (SVR) is used to estimate the fusion weights directly from audio features, and a Sigma-Pi network sampling method is incorporated to reduce the feature dimensions. Experiments on a homegrown Chinese database and the CMU English database both demonstrate that the method improves the accuracy of audio-visual bimodal speaker identification under dynamically varying acoustic noise conditions.
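To make the role of a fusion weight concrete, here is a minimal Python sketch of decision-level fusion. It assumes a simple weighted sum of per-speaker log-likelihood scores; the abstract does not state the exact fusion rule, so that form, the function names `fuse_scores` and `identify`, the speaker labels, and all score values are hypothetical illustrations rather than the authors' method. In the paper the weight is estimated by SVR from audio features; here it is simply passed in as a number.

```python
# Illustrative sketch only: the weighted-sum rule and all values below are
# assumptions, not taken from the paper.

def fuse_scores(audio_scores, visual_scores, audio_weight):
    """Combine per-speaker log-likelihoods with a single audio weight in [0, 1]."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_scores.keys() != visual_scores.keys():
        raise ValueError("both modalities must score the same speakers")
    visual_weight = 1.0 - audio_weight
    return {
        speaker: audio_weight * audio_scores[speaker]
        + visual_weight * visual_scores[speaker]
        for speaker in audio_scores
    }


def identify(audio_scores, visual_scores, audio_weight):
    """Return the speaker with the highest fused score."""
    fused = fuse_scores(audio_scores, visual_scores, audio_weight)
    return max(fused, key=fused.get)


# Hypothetical scores: the audio stream favours spk_a, the visual stream spk_b.
audio = {"spk_a": -10.0, "spk_b": -12.0}
visual = {"spk_a": -9.0, "spk_b": -5.0}

# A high audio weight (clean audio) follows the audio stream.
clean_decision = identify(audio, visual, audio_weight=0.9)
# A low audio weight (noisy audio) lets the visual stream dominate.
noisy_decision = identify(audio, visual, audio_weight=0.2)
print(clean_decision, noisy_decision)
```

With these made-up scores the decision changes as the audio weight drops, which is the behaviour a noise-adaptive weight estimator is meant to exploit.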


Similar resources

Multi-level Fusion of Audio and Visual Features for Speaker Identification

This paper explores the fusion of audio and visual evidence through a multi-level hybrid fusion architecture based on a dynamic Bayesian network (DBN), which combines model-level and decision-level fusion to achieve higher performance. In model-level fusion, a new audio-visual correlative model (AVCM) based on DBN is proposed, which describes both the intercorrelations and loose timing synchroni...


Missing Reliability Correction in Modality Information Integration for Robust Speaker Identification

In emerging biometrics technology, speaker identification in real environments is one of the key issues for enhancing the density of human-computer interaction. In this paper, we propose an optimizing factor, defined through a fuzzy membership function, for correcting the reliability measures of the different modalities in a bimodal fusion process for speaker identification. In the bimodal spea...


Modeling the Synchrony between Audio and Visual Modalities for Speaker Identification

This work aims to understand and model the inter-modal temporal relations between the audio and visual modalities of speech and validate whether the captured relations can improve the performance of audio-visual bimodal modeling for such applications as audio-visual speaker identification. We propose to extend our audio-visual correlative model (AVCM) with explicit durational modeling of the pa...


The use of temporal speech and lip information for multi-modal speaker identification via multi-stream HMMs

This paper investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based arou...


Likelihood Ratio Based Score Fusion for Audio-Visual Speaker Identification in Challenging Environment

Combining visual speech information with audio utterances is well known to enhance the performance of noise-robust speaker identification. This paper presents an approach to evaluate the performance of a noise-robust audio-visual speaker identification system using likelihood-ratio-based score fusion in challenging environments. Though the traditional HMM based audio-visual speaker identification...




Publication date: 2006